The greater complexity of the human brain relative to that of a mouse is well known among neuroscientists. Nevertheless, the mouse brain is similar in structure and function to the human brain, making mice a common subject of neurological study. In this analysis, we examine data from an experiment by Steinmetz et al. (2019), which recorded the activity of neurons in the visual cortex of mice presented with visual stimuli. Following each stimulus, the mice made a decision by turning a wheel controlled by their forepaws, and that decision determined whether they received a reward or a penalty.
The data set we use covers five sessions from two of the ten mice in the study, Cori and Forssmann, recorded between mid-December 2016 and early November 2017. The number of trials varies between sessions, with 39 time bins tracked per trial. Because the five sessions contain differing numbers of trials, we condense them into one combined data set. To do this, we first construct a response variable that can be compared across imbalanced sessions: the mean firing rate of each trial, defined as the average number of spikes per second across all recorded neurons within a 0.4-second interval. The choice of 0.4 seconds follows the design of the experiment, which focused on spike trains from stimulus onset to 0.4 seconds afterward. The resulting data frame, merged from the five sessions, is shown below: 1196 trials spanning the 5 sessions, with the mean firing rate, the left and right contrasts, the feedback type, and an ID column identifying the session each trial came from.
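As a toy illustration of this firing-rate calculation (using a made-up spike matrix, not the experimental data), a trial recorded from 3 neurons with 6 total spikes in the 0.4-second window has a mean firing rate of 6 / 3 / 0.4 = 5 spikes per second:

```r
# Made-up spike matrix: 3 neurons (rows) x 5 time bins (columns)
toy.spks <- matrix(c(0, 1, 0, 1, 0,
                     1, 0, 0, 0, 1,
                     0, 1, 1, 0, 0), nrow = 3, byrow = TRUE)
# Mean firing rate = total spikes / number of neurons / window length in seconds
sum(toy.spks) / nrow(toy.spks) / 0.4  # 6 / 3 / 0.4 = 5
```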
# Read the five sessions into a single list
session <- list()
for (i in 1:5) {
  session[[i]] <- readRDS(paste('./Data/session', i, '.rds', sep = ''))
  # print(session[[i]]$mouse_name)
  # print(session[[i]]$date_exp)
}
# id=11
# session[[5]]$feedback_type[id]
# session[[5]]$contrast_left[id]
# session[[5]]$contrast_right[id]
# length(session[[1]]$time[[id]])
# dim(session[[5]]$spks[[id]])
#ID=1
t <- 0.4 # length of the spike-train window in seconds, from Background
# Rows of contrast is the number of trials (varying)
# Rows of spks/time is the number of neurons (varying)
# Columns of spks/time is the number of tracked time bins (fixed at 39)
#
# n.trials=length(session[[ID]]$spks)
# # Number of neurons is the rows of spikes, which varies
# n.neurons=dim(session[[ID]]$spks[[1]])[1]
#
# # Obtain the firing rate
# firingrate=numeric(n.trials)
# for(i in 1:n.trials){
# firingrate[i]=sum(session[[ID]]$spks[[i]])/n.neurons/t
# }
# firingrate
# Compute the mean firing rate for each trial:
# total spikes / number of neurons / window length t
total.fire.rate <- lapply(1:5, function(i)
  sapply(session[[i]]$spks, function(s)
    sum(s) / nrow(s) / t))
# Flatten the list of per-session vectors
rates.flat <- unlist(total.fire.rate)
mice <- data.frame("Firing.Rate" = rates.flat)
# Now create a data frame of covariates, with the session number as a variable
abc <- lapply(1:5, function(i)
  cbind(
    session[[i]]$contrast_left,
    session[[i]]$contrast_right,
    session[[i]]$feedback_type,
    rep(i, length(session[[i]]$contrast_left))
  ))
contrast.dat <- do.call(rbind, abc)
# Merge data.frames
mice <- cbind(mice, contrast.dat)
colnames(mice) <-
c("Firing.Rate", "C.Left", "C.Right", "Feedback_Type", "Session")
# Convert the categorical columns into factors
cols2fac <- c("C.Left", "C.Right", "Feedback_Type", "Session")
mice[cols2fac] <- lapply(mice[cols2fac], factor)
mice
Before beginning our analysis, we report summary statistics of the data, along with visualizations, to understand the data in more depth. Below we print a five-number summary (plus the mean) of our response, the mean firing rate. The lowest mean firing rate is 0.404 and the largest is 7.219. The median of 2.962 slightly exceeds the mean of 2.858, suggesting a mildly left-skewed distribution, which we plot below colored by Feedback Type and by Session number.
kable(as.data.frame(c(summary(mice$Firing.Rate))))
| Statistic | Firing Rate |
|---|---|
| Min. | 0.4040404 |
| 1st Qu. | 1.9166667 |
| Median | 2.9620075 |
| Mean | 2.8579374 |
| 3rd Qu. | 3.6913696 |
| Max. | 7.2191011 |
# Create Histograms,
# separated by Session
ggplotly(
ggplot(mice, aes(x = Firing.Rate, fill = Session)) + geom_histogram(bins = 22, alpha = 0.8) + labs(
x = "Firing Rate",
y = "Count",
fill = "Session",
title = "Histograms of Firing Rates Colored by Session"
)
)
# Separated by Feedback Type
ggplotly(
ggplot(mice, aes(x = Firing.Rate, fill = Feedback_Type)) + geom_histogram(
position = "dodge",
bins = 15,
alpha = 0.9
) + scale_fill_manual(values = c("#56B4E9", "#E69F00", "#999999")) + labs(
x = "Firing Rate",
y = "Count",
fill = "Feedback Type",
title = "Histograms of Firing Rates Colored by Feedback Type"
)
)
# Overlaid histograms, separated by Feedback Type
ggplotly(
  ggplot(mice, aes(x = Firing.Rate, fill = Feedback_Type)) +
    geom_histogram(position = "identity", bins = 22, alpha = 0.5) +
    scale_fill_manual(values = c("#E69F00", "#999999", "#56B4E9")) +
    labs(x = "Firing Rate", y = "Count", fill = "Feedback Type")
)
# Box Plots
# For Left Contrast
#ggplot(mice, aes(x = C.Left, y= Firing.Rate, fill = C.Left)) + geom_boxplot()
# Right Contrast
#ggplot(mice, aes(x = C.Right, y= Firing.Rate, fill = C.Right)) + geom_boxplot()
Our goal is to build a mixed effects model for the mean firing rate of neurons, with fixed effects for the left contrast \(\alpha_i\), the right contrast \(\beta_j\), and their interaction \(\alpha\beta_{ij}\), and a random effect \(\gamma_k\) for each session. We have the following model:
\[ Y_{ijkl} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} + \gamma_k + \epsilon_{ijkl} \] where \(i\) and \(j\) index the left and right contrast levels \(\{0, 0.25, 0.5, 1\}\), \(k = 1, \ldots, 5\) indexes the session, and \(l = 1, \ldots, n_k\) indexes the trials within session \(k\). The model carries the constraints \(\sum_i n_i\alpha_i = 0\), \(\sum_j n_j\beta_j = 0\), and \(\sum_{ij} n_{ij}\alpha\beta_{ij} = 0\), with \(\gamma_k \sim N(0, \sigma_{\gamma}^2)\) and \(\epsilon_{ijkl} \sim N(0,\sigma^2)\). With fixed effects for both the left and right contrasts, we aim to quantify how the contrasts affect the mean firing rate across all neurons. However, because we compare these mean firing rates across multiple sessions, we must include a random effect to account for all non-fixed sources of variation between sessions. Both this random effect and the error term are normally distributed.
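To make this structure concrete, here is a minimal simulation sketch (with made-up parameter values, not estimates from our data) that generates observations from a random-intercept model and checks that `lmer` recovers the session-level intercepts:

```r
library(lme4)
set.seed(1)
# Made-up example: 5 sessions of 200 trials each, random intercepts
# gamma_k ~ N(0, 1), grand mean 3, residual variance 0.4 (all illustrative)
sim <- data.frame(Session = factor(rep(1:5, each = 200)))
gamma <- rnorm(5, mean = 0, sd = 1)
sim$y <- 3 + gamma[as.integer(sim$Session)] + rnorm(nrow(sim), sd = sqrt(0.4))
fit <- lmer(y ~ (1 | Session), data = sim, REML = FALSE)
ranef(fit)$Session  # estimated session intercepts, close to the simulated gamma_k
```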
We now build our model using the lme4 library, which allows us to fit the random intercept \(\gamma_k\) for each session. Since we will be performing model comparisons between fixed effects, we abstain from using restricted maximum likelihood estimation (Oehlert 4). We fit the model:
lm1 <- lmer(Firing.Rate ~ C.Left * C.Right + (1 | Session),
            data = mice,
            REML = FALSE)
# plot(lm1) save for sensitivity analysis
sum.lm1 <- summary(lm1)
kable(sum.lm1$coefficients)
|  | Estimate | Std. Error | t value |
|---|---|---|---|
| (Intercept) | 2.6440266 | 0.4514576 | 5.8566440 |
| C.Left0.25 | 0.1230864 | 0.1151461 | 1.0689583 |
| C.Left0.5 | 0.3297409 | 0.0774725 | 4.2562308 |
| C.Left1 | 0.4181565 | 0.0790000 | 5.2931230 |
| C.Right0.25 | 0.3294402 | 0.0957057 | 3.4422197 |
| C.Right0.5 | 0.3104181 | 0.0771486 | 4.0236369 |
| C.Right1 | 0.4755159 | 0.0655214 | 7.2574085 |
| C.Left0.25:C.Right0.25 | -0.2867070 | 0.1861141 | -1.5404908 |
| C.Left0.5:C.Right0.25 | -0.3270568 | 0.1543957 | -2.1183028 |
| C.Left1:C.Right0.25 | -0.4207931 | 0.1394203 | -3.0181631 |
| C.Left0.25:C.Right0.5 | -0.2243095 | 0.1666453 | -1.3460297 |
| C.Left0.5:C.Right0.5 | -0.2801156 | 0.1520929 | -1.8417404 |
| C.Left1:C.Right0.5 | -0.3443762 | 0.1486681 | -2.3164097 |
| C.Left0.25:C.Right1 | -0.3084436 | 0.1447033 | -2.1315595 |
| C.Left0.5:C.Right1 | -0.1159386 | 0.1418182 | -0.8175153 |
| C.Left1:C.Right1 | -0.1903870 | 0.1450161 | -1.3128678 |
ranef(lm1)
## $Session
## (Intercept)
## 1 1.2128530
## 2 0.4034316
## 3 0.6952204
## 4 -0.7896509
## 5 -1.5218540
##
## with conditional variances for "Session"
# Now build a model with no interactions
lm2 <- lmer(Firing.Rate ~ C.Left + C.Right + (1 | Session),
            data = mice,
            REML = FALSE)
summary(lm2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Firing.Rate ~ C.Left + C.Right + (1 | Session)
## Data: mice
##
## AIC BIC logLik deviance df.resid
## 2349.3 2395.1 -1165.7 2331.3 1187
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.8457 -0.7083 -0.1206 0.6508 5.2257
##
## Random effects:
## Groups Name Variance Std.Dev.
## Session (Intercept) 1.0189 1.0094
## Residual 0.4003 0.6327
## Number of obs: 1196, groups: Session, 5
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 2.69990 0.45252 5.966
## C.Left0.25 -0.06272 0.05531 -1.134
## C.Left0.5 0.21943 0.05318 4.126
## C.Left1 0.24160 0.05155 4.687
## C.Right0.25 0.12001 0.05538 2.167
## C.Right0.5 0.18721 0.05419 3.455
## C.Right1 0.38039 0.04820 7.892
##
## Correlation of Fixed Effects:
## (Intr) C.L0.2 C.L0.5 C.Lft1 C.R0.2 C.R0.5
## C.Left0.25 -0.018
## C.Left0.5 -0.026 0.256
## C.Left1 -0.025 0.271 0.286
## C.Right0.25 -0.023 -0.156 -0.124 -0.249
## C.Right0.5 -0.028 -0.152 -0.051 -0.074 0.289
## C.Right1 -0.032 -0.252 -0.027 -0.030 0.313 0.332
anova(lm1, lm2)
# At the 0.05 level, the likelihood ratio test rejects the null hypothesis that there is no difference between the models, so we keep the full model with interactions
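The likelihood ratio statistic behind `anova(lm1, lm2)` can also be computed by hand, which makes the test explicit. A sketch, assuming `lm1` and `lm2` are the two maximum likelihood fits above:

```r
# LRT statistic: twice the difference in log-likelihoods (full minus reduced)
lrt.stat <- as.numeric(2 * (logLik(lm1) - logLik(lm2)))
# Degrees of freedom: difference in parameter counts (the 9 interaction terms)
df.diff <- attr(logLik(lm1), "df") - attr(logLik(lm2), "df")
pchisq(lrt.stat, df = df.diff, lower.tail = FALSE)  # p-value, as in anova(lm1, lm2)
```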
### PERTAINING TO THE RANDOM EFFECT TESTING
# The 3d Scatter will help visualize the difference between the random effects
# Set up data by predictions for 3d Scatter (not including heatmap, not as useful)
pred.mice <-
expand.grid(x = unique(mice$C.Left), y = unique(mice$C.Right)) # All permutations of contrasts
pred.mice <- pred.mice %>% slice(rep(row_number(), 5))
pred.mice <-
cbind(pred.mice, c(sapply(1:5, function (x)
rep(x, 16)))) # combine with session numbers
colnames(pred.mice) <- c("C.Left", "C.Right", "Session")
pred.mice <-
pred.mice %>% mutate_if(is.numeric, as.factor) # convert to factors
pred.mice <-
cbind(pred.mice, predict(lm1, pred.mice)) # add predictions to df
colnames(pred.mice) <-
c("C.Left", "C.Right", "Session", "Pred") # add column names
# ggplotly(ggplot(pred.mice, aes(
# x = C.Left, y = C.Right, fill = Pred
# )) + geom_tile() + facet_wrap( ~ Session, ncol = 2))
# Let's try a plotly 3D scatter instead
l <- list(
font = list(
family = "sans-serif",
size = 12,
color = "#000"
),
bgcolor = "#E2E2E2",
bordercolor = "#FFFFFF",
borderwidth = 2
)
plot_ly(
x = pred.mice$C.Left,
y = pred.mice$C.Right,
z = pred.mice$Pred,
type = "scatter3d",
mode = "markers",
color = pred.mice$Session
) %>% layout(
scene = list(
xaxis = list(title = "Left Contrast"),
yaxis = list(title = "Right Contrast"),
zaxis = list(title = "Mean Firing Rate")
),
legend = list(title = list(text = '<b> Session </b>'))
) %>% layout(legend = l)
# Test for model with no random effect
lm.noRando <- lm(Firing.Rate ~ C.Left * C.Right,
data = mice)
anova(lm.noRando)
# Likelihood ratio test
anova(lm1, lm.noRando)
# The random effect has a very small p-value, so we also reject the null and continue with our full mixed effects model
# Before checking normality and homoscedasticity, let's check for outliers
cookD.lm1 <- cooks.distance(lm1)
row.ind <- seq(1, nrow(mice), 1)
ggplotly(
  ggplot() + aes(x = row.ind, y = cookD.lm1) + geom_point() + labs(y = "Cook's Distance", x = "Observation Number", title = "Cook's Distance vs Observation Number for the Mixed Effect Model")
)
ggplotly(
  ggplot() + aes(x = row.ind, y = sum.lm1$residuals) + geom_point() + labs(y = "Residuals", x = "Observation Number", title = "Residuals vs Observation Number for the Mixed Effect Model")
)
# From the plot we can see that observations 10, 806, and 3 have the highest Cook's distances; let's see what stands out
# Observation 44 has the largest residual value
mice[c(3,10,44, 806),]
# Observation 10 has a firing rate of 6.66, while the average firing rate in session 1 is around 4; observation 3 is also a session 1 trial with a high firing rate, though not as drastic as observation 10
# Observation 44 also has an extremely high mean firing rate for session 1
# Observation 806 has an abnormally high firing rate for its contrasts: the predicted value for C.Left = 0.25 and C.Right = 0 is 2.183, while observation 806 has a firing rate of 3.85
# We will remove these rows, rebuild the model, and check assumptions
mice.red <- mice %>%
  filter(!row_number() %in% c(3, 10, 44, 806))
# build model with outliers removed
lm1.red <- lmer(Firing.Rate ~ C.Left * C.Right + (1 | Session),
                data = mice.red,
                REML = FALSE)
sum.lm1.red <- summary(lm1.red)
# Continue with sensitivity analysis
# Plot a histogram of the residuals
ggplotly(
ggplot() + aes(x = sum.lm1.red$residuals) + geom_histogram(bins = 20, fill = "#E69F00") + theme_igray() + labs(x = "Pearson Residuals", y = "Count", title = "Distribution of Residuals for Full Mixed Effect Model")
)
# Shapiro-Wilk test of normality
shapiro.test(sum.lm1.red$residuals)
##
## Shapiro-Wilk normality test
##
## data: sum.lm1.red$residuals
## W = 0.99007, p-value = 3.287e-07
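As a visual complement to the Shapiro-Wilk test, a normal Q-Q plot of the same residuals shows where the departures from normality occur (a base R sketch using the residuals of the reduced model above):

```r
# Q-Q plot of the Pearson residuals against normal quantiles
qqnorm(sum.lm1.red$residuals, main = "Normal Q-Q Plot of Model Residuals")
qqline(sum.lm1.red$residuals, col = "#cd534cff")
```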
ggplotly(
ggplot() + aes(x = predict(lm1.red), y = sum.lm1.red$residuals) + geom_point() + theme_igray() + stat_spline(
spar = 0.95,
size = 0.8,
color = "#cd534cff"
) + labs(x = "Fitted Values", y = "Pearson Residuals", title = "Pearson Residuals vs. Fitted Values for the Final Model")
)
leveneTest(Firing.Rate ~ C.Left*C.Right, data = mice.red)
# The Levene test and the residuals-vs-fitted plot indicate that the homoscedasticity assumption is met
# We now build a logistic regression model for feedback type, starting with the full data set
mice.logit <-
glm(Feedback_Type ~ C.Left + C.Right + Session,
data = mice,
family = "binomial")
summary(mice.logit)
##
## Call:
## glm(formula = Feedback_Type ~ C.Left + C.Right + Session, family = "binomial",
## data = mice)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7091 -1.3261 0.7633 0.9667 1.2175
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.15575 0.17778 6.501 7.98e-11 ***
## C.Left0.25 -0.32590 0.18147 -1.796 0.072508 .
## C.Left0.5 -0.26937 0.17758 -1.517 0.129299
## C.Left1 -0.08745 0.17429 -0.502 0.615826
## C.Right0.25 -0.74861 0.18307 -4.089 4.33e-05 ***
## C.Right0.5 -0.63755 0.18018 -3.538 0.000402 ***
## C.Right1 -0.47784 0.16297 -2.932 0.003368 **
## Session2 -0.17510 0.19906 -0.880 0.379063
## Session3 -0.07145 0.20489 -0.349 0.727290
## Session4 0.04070 0.20132 0.202 0.839789
## Session5 0.02000 0.19940 0.100 0.920085
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1539.1 on 1195 degrees of freedom
## Residual deviance: 1504.2 on 1185 degrees of freedom
## AIC: 1526.2
##
## Number of Fisher Scoring iterations: 4
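The logistic coefficients above are on the log-odds scale; exponentiating them gives odds ratios, which are easier to interpret. A sketch using the fitted model, assuming the modeled success level of `Feedback_Type` is 1 (a reward):

```r
# Odds ratios with Wald 95% confidence intervals for the feedback model
or <- exp(coef(mice.logit))
ci <- exp(confint.default(mice.logit))
round(cbind(OddsRatio = or, ci), 3)
# e.g., C.Right0.25: exp(-0.749) is about 0.47, i.e. roughly half the odds
# of a reward relative to the baseline C.Right = 0
```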
The following works are cited in this report:

Oehlert, G. W. (2010). A First Course in Design and Analysis of Experiments. University of Minnesota.

Steinmetz, N. A., Zatka-Haas, P., Carandini, M., & Harris, K. D. (2019). Distributed coding of choice, action and engagement across the mouse brain. Nature, 576(7786), 266-273. doi:10.1038/s41586-019-1787-x
We report our R session information below for reproducibility.
sessionInfo()
## R version 4.0.5 (2021-03-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] car_3.0-12 carData_3.0-5 ggformula_0.10.1 ggridges_0.5.3
## [5] scales_1.1.1 ggstance_0.3.5 ggthemes_4.2.4 dplyr_1.0.8
## [9] knitr_1.39 lme4_1.1-28 Matrix_1.3-2 plotly_4.10.0
## [13] ggplot2_3.3.5
##
## loaded via a namespace (and not attached):
## [1] httr_1.4.2 sass_0.4.1 tidyr_1.2.0 jsonlite_1.8.0
## [5] viridisLite_0.4.0 splines_4.0.5 bslib_0.3.1 assertthat_0.2.1
## [9] highr_0.9 yaml_2.2.1 pillar_1.7.0 lattice_0.20-41
## [13] glue_1.4.2 digest_0.6.29 RColorBrewer_1.1-2 polyclip_1.10-0
## [17] minqa_1.2.4 colorspace_2.0-3 htmltools_0.5.2 plyr_1.8.6
## [21] pkgconfig_2.0.3 labelled_2.8.0 haven_2.5.0 purrr_0.3.4
## [25] tweenr_1.0.2 ggforce_0.3.3 tibble_3.1.6 generics_0.1.2
## [29] farver_2.1.0 ellipsis_0.3.2 withr_2.5.0 lazyeval_0.2.2
## [33] cli_3.1.1 magrittr_2.0.1 crayon_1.5.0 evaluate_0.15
## [37] fansi_1.0.2 nlme_3.1-152 MASS_7.3-53.1 forcats_0.5.1
## [41] tools_4.0.5 data.table_1.14.2 hms_1.0.0 lifecycle_1.0.1
## [45] stringr_1.4.0 munsell_0.5.0 compiler_4.0.5 jquerylib_0.1.4
## [49] rlang_1.0.2 grid_4.0.5 nloptr_2.0.0 rstudioapi_0.13
## [53] htmlwidgets_1.5.3 crosstalk_1.1.1 mosaicCore_0.9.0 labeling_0.4.2
## [57] rmarkdown_2.14 boot_1.3-27 gtable_0.3.0 abind_1.4-5
## [61] DBI_1.1.2 R6_2.5.1 fastmap_1.1.0 utf8_1.2.2
## [65] stringi_1.7.6 Rcpp_1.0.8 vctrs_0.3.8 tidyselect_1.1.2
## [69] xfun_0.30